Relaxed Spatio-Temporal Deep Feature Aggregation for Real-Fake Expression Prediction
Frame-level visual features are generally aggregated in time with
techniques such as LSTM, Fisher Vectors, and NetVLAD to produce a robust
video-level representation. We introduce a learnable aggregation technique
whose primary objective is to retain the short-time temporal structure between
frame-level features and their spatial interdependencies in the representation.
It can also be easily adapted to cases where training samples are very
scarce. We evaluate the method on a real-fake expression prediction
dataset to demonstrate its superiority. Our method obtains a score of 65% on the
test dataset in the official MAP evaluation, only one misclassified
decision behind the best reported result in the ChaLearn Challenge (i.e., 66.7%).
Lastly, we believe that this method can be extended to other problems, such
as action/event recognition, in the future.
Comment: Submitted to International Conference on Computer Vision Workshop
A simple and effective mechanism for stored video streaming with TCP transport and server-side adaptive frame discard
Transmission control protocol (TCP), with its well-established congestion control mechanism, is the prevailing transport
layer protocol for non-real-time data in current Internet Protocol (IP) networks. It would be desirable to transmit
any type of multimedia data using TCP in order to take advantage of the extensive operational experience behind TCP
in the Internet. However, some features of TCP, including retransmissions and variations in throughput and delay,
although not catastrophic for non-real-time data, may result in inefficiencies for video streaming applications. In this
paper, we propose an architecture that consists of an input buffer at the server side, coupled with the congestion control
mechanism of TCP at the transport layer, for efficiently streaming stored video in the best-effort Internet. The proposed
buffer management scheme selectively discards low-priority frames from its head end that would otherwise
jeopardize the successful playout of high-priority frames. Moreover, the proposed discarding policy adapts to
changes in the bandwidth available to the video stream.
© 2004 Elsevier B.V. All rights reserved.
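The head-end discard idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the I/P/B priority classes, the per-frame playout deadlines, the `bandwidth` estimate in bytes per second, and all names (`Frame`, `DiscardingBuffer`, `pop_for_send`) are assumptions made for illustration only. The sketch drops the frame at the head of the buffer when sending it would arrive too late, or when it would make a more important frame queued behind it miss its deadline.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    seq: int         # position in the stream
    kind: str        # "I", "P" or "B" (assumed priority classes)
    size: int        # frame size in bytes
    deadline: float  # playout deadline in seconds


# Assumed priority order: a lower number means a more important frame.
PRIORITY = {"I": 0, "P": 1, "B": 2}


class DiscardingBuffer:
    """Hypothetical server-side input buffer placed in front of a TCP socket."""

    def __init__(self):
        self.queue = deque()

    def push(self, frame):
        self.queue.append(frame)

    def pop_for_send(self, now, bandwidth):
        """Return (frame_to_send, dropped_frames).

        bandwidth is the current estimate, in bytes per second, of the
        throughput available to the stream; it must be positive.
        """
        if bandwidth <= 0:
            raise ValueError("bandwidth must be positive")
        dropped = []
        while self.queue:
            if self._must_discard(now, bandwidth):
                dropped.append(self.queue.popleft())
            else:
                return self.queue.popleft(), dropped
        return None, dropped

    def _must_discard(self, now, bandwidth):
        head = self.queue[0]
        finish = now + head.size / bandwidth
        if finish > head.deadline:
            return True  # the head frame itself would arrive too late
        # Would sending the head make a more important frame behind it late?
        # Only more important frames are counted, since the others may be dropped.
        t = finish
        for frame in list(self.queue)[1:]:
            if PRIORITY[frame.kind] < PRIORITY[head.kind]:
                t += frame.size / bandwidth
                if t > frame.deadline:
                    return True
        return False
```

With a B frame of 800 bytes (deadline 1.0 s) ahead of an I frame of 900 bytes (deadline 1.5 s), an estimate of 1000 bytes/s causes the B frame to be discarded so that the I frame arrives in time, while an estimate of 2000 bytes/s lets both frames through. This is how the sketch reflects the adaptivity to available bandwidth that the abstract describes.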
AN ABSTRACTION BASED REDUCED REFERENCE DEPTH PERCEPTION METRIC FOR 3D VIDEO
19th IEEE International Conference on Image Processing (ICIP) -- SEP 30-OCT 03, 2012 -- Lake Buena Vista, FL
NUR YILMAZ, Gokce/0000-0002-0015-9519; B. Akar, Gozde/0000-0002-4227-5606
WOS: 000319334900152
To speed up the widespread adoption of 3D video technologies (e.g., coding, transmission, display, etc.), the effect of these technologies on 3D perception should be investigated efficiently and reliably. Using Full-Reference (FR) objective metrics for this investigation is not practical, especially for "on the fly" 3D perception evaluation. Thus, a Reduced Reference (RR) metric is proposed in this paper to predict the depth perception of 3D video. The color-plus-depth 3D video representation is exploited for the proposed metric. Since the significant depth levels of the depth map sequences have a great influence on the depth perception of users, they are used as side information in the proposed RR metric. To determine the significant depth levels, the depth map sequences are abstracted using a bilateral filter. The Video Quality Metric (VQM) is used to predict the depth perception provided by the significant depth levels because it correlates well with the Human Visual System (HVS). The performance assessment results show that the proposed RR metric can be used in place of an FR metric to reliably measure the depth perception of 3D video with a low overhead.
Inst Elect & Elect Engineers (IEEE), IEEE Signal Proc So